Machine Learning in Pharmaceutical Research: Data Clustering, Why so and How so
Abstract
Background: In clinical data, subgroups can sometimes be identified by regressing subgroup characteristics against an outcome variable. In data samples without an available outcome variable, cluster analysis is a suitable alternative. It rests on the idea that patients with closely related characteristics may also be more closely related in other respects, such as prognosis and treatment efficacy.
Objective: To compare the performance of three cluster methodologies: hierarchical, k-means, and density-based clustering.
Methods: A simulated data example of fifty patients with mental depression was used.
Results: Each methodology identified three clusters, but the cluster patterns were very different. The hierarchical method produced round clusters of different sizes, the k-means method round clusters of equal size, and the density-based method non-circular clusters, also of different sizes. The clusters from the hierarchical method agreed better with the clinically expected patterns than those from the other two methods.
Conclusions:
1. Cluster analysis is little used in clinical research.
2. Hierarchical clustering is adequate if subgroups in the data are expected to differ in size but otherwise be Gaussian-like. It is available in the Classify module of SPSS.
3. K-means clustering is adequate if subgroups are expected to be approximately similar in size. It is also available in the Classify module of SPSS.
4. Density-based clustering is adequate if small outlier groups are expected within an otherwise homogeneous population. It is not available in SPSS, but an interactive Java applet is freely available on the Internet.
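The three methods compared above can be sketched in plain Python to make their mechanics concrete. This is a minimal illustration under stated assumptions, not the authors' SPSS or Java-applet workflow: the data are fifty hypothetical two-dimensional points drawn from three Gaussian blobs of unequal size (not the study's depression data), the hierarchical variant uses average linkage, k-means is seeded with deterministic farthest-first centres, and DBSCAN stands in as a representative density-based algorithm because the abstract does not name one. The function names and all parameter values (`eps`, `min_pts`, blob centres and spreads) are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means; farthest-first initial centres are an assumption."""
    centres = [points[0]]
    while len(centres) < k:
        # next centre: the point farthest from every centre chosen so far
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        # assignment step: each point joins its nearest centre
        new = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new == labels:
            break  # assignments unchanged, so the iterations have converged
        labels = new
        # update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def hierarchical(points, k):
    """Agglomerative clustering with average linkage, cut at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def avg_link(a, b):
        # mean distance over all cross-cluster pairs of points
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]  # j > i, so index i is still valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def dbscan(points, eps, min_pts):
    """DBSCAN: label -1 marks noise; min_pts counts the point itself."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is also a core point, so keep expanding
    return labels


if __name__ == "__main__":
    rng = random.Random(42)

    def blob(cx, cy, sd, n):
        return [(rng.gauss(cx, sd), rng.gauss(cy, sd)) for _ in range(n)]

    # hypothetical data: three blobs of unequal size (30 + 15 + 5 = 50 points)
    data = blob(0, 0, 1.0, 30) + blob(6, 6, 0.7, 15) + blob(0, 8, 0.3, 5)
    results = {
        "hierarchical": hierarchical(data, 3),
        "k-means": kmeans(data, 3),
        "density-based": dbscan(data, eps=1.0, min_pts=4),
    }
    for name, labels in results.items():
        # cluster label -> number of points; -1 would mean DBSCAN noise
        print(f"{name:>13}: {dict(sorted(Counter(labels).items()))}")
```

Comparing the printed label counts shows one structural difference between the methods: the hierarchical and k-means sketches are told to return three clusters, whereas DBSCAN infers the number of clusters from `eps` and `min_pts` and labels points in sparse regions as noise (-1).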